Detecting Phishing Emails the Natural Language Way
Authors
Abstract
Phishing causes billions of dollars in damage every year and poses a serious threat to the Internet economy. Email is still the most commonly used medium for launching phishing attacks [1]. In this paper, we present a comprehensive natural-language-based scheme to detect phishing emails using features that are invariant and fundamentally characterize phishing. Our scheme utilizes all the information present in an email, namely, the header, the links, and the text in the body. Although a phishing email is obviously designed to elicit an action from the intended victim, none of the existing detection schemes uses this fact to identify phishing emails. Our detection protocol is designed specifically to distinguish between “actionable” and “informational” emails. To this end, we incorporate natural language techniques into phishing detection. We also utilize contextual information, when available, to detect phishing: we study the problem of phishing detection within the contextual confines of the user’s email box and demonstrate that context plays an important role in detection. To the best of our knowledge, this is the first scheme that utilizes natural language techniques and contextual information to detect phishing. We show that our scheme outperforms existing phishing detection schemes. Finally, our protocol detects phishing at the email level rather than detecting masqueraded websites. This is crucial to prevent the victim from clicking any harmful links in the email. Our implementation, called PhishNet-NLP, operates between a user’s mail transfer agent (MTA) and mail user agent (MUA) and processes each arriving email for phishing attacks even before reaching the
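The abstract does not say how PhishNet-NLP decides whether an email is "actionable" or "informational". The following minimal Python sketch is only a toy illustration of that distinction and is not the paper's method. It checks an email for three hypothetical cues: a request to act, time pressure, and a link to act on. The `Email` class, the cue word lists and the 2/3 threshold are all assumptions chosen for illustration. Unlike the scheme described above, the sketch ignores headers and user context.

```python
import re
from dataclasses import dataclass

# Hypothetical cue lists chosen for illustration; not taken from PhishNet-NLP.
ACTION_VERBS = {"click", "verify", "confirm", "update", "login", "reply",
                "download", "enter", "submit"}
URGENCY_WORDS = {"immediately", "urgent", "suspended", "expire", "expires", "now"}
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z']+")


@dataclass
class Email:
    """Hypothetical minimal email representation (subject and plain-text body)."""
    subject: str
    body: str


def actionability_score(email: Email) -> float:
    """Return the fraction of the three cue groups present, a value in [0, 1]."""
    text = f"{email.subject} {email.body}".lower()
    # Remove URLs first so words inside links (e.g. ".../verify") are not counted.
    words = set(WORD_RE.findall(URL_RE.sub(" ", text)))
    cues = [
        bool(words & ACTION_VERBS),       # the text asks the reader to do something
        bool(words & URGENCY_WORDS),      # the text pressures the reader to act quickly
        bool(URL_RE.search(email.body)),  # the body offers a link to act on
    ]
    return sum(cues) / len(cues)


def is_actionable(email: Email, threshold: float = 2 / 3) -> bool:
    """Label an email "actionable" when enough cues fire (threshold is an assumption)."""
    return actionability_score(email) >= threshold
```

For example, `Email("Account suspended", "Please click the link below to verify your account immediately: http://example.com/login")` triggers all three cues. A plain newsletter triggers none. A real system would replace these keyword lists with proper linguistic analysis, such as detecting imperative sentences.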
Similar resources
Community Targeted Spam: A Middle Ground Between General Spam and Spear Phishing
Looking at today’s spam and phishing landscape, we can identify two diametrically opposed approaches. On the one hand, there is general spam, which targets as many people as possible with generic, pre-formed texts; on the other hand, there are very specific emails, handcrafted and aimed at high-value targets. While these two worlds currently don’t intersect at all, we envision a future where ...
Detecting Known and New Salting Tricks in Unwanted Emails
Spam and phishing emails are not only annoying to users but also a real threat to Internet communication and the web economy. The fight against unwanted emails has become a cat-and-mouse game between criminals and people trying to develop techniques for detecting such emails. Criminals constantly develop new tricks and adopt those that let emails pass spam filters. We have devel...
Breaching the Human Firewall: Social engineering in Phishing and Spear-Phishing Emails
We examined the influence of three social engineering strategies on users’ judgments of how safe it is to click on a link in an email. The three strategies examined were authority, scarcity and social proof, and the emails were either genuine, phishing or spear-phishing. Of the three strategies, the use of authority was the most effective strategy in convincing users that a link in an email was...
Artificial Immune System Based Classification Approach for Detecting Phishing Mails
Phishing/spam is an attack that uses social engineering to illegally acquire and use someone else’s data, with the attacker posing as a legitimate website for their own benefit. Phishing emails are messages designed to fool the recipient into handing over personal information, such as login names, passwords, credit card numbers, account credentials, and social security numbers. Fraudulent emails har...
Concept Learning from Natural Language Interactions
Humans can efficiently learn about new concepts using natural language communications. For example, a human can learn the concept of a phishing email from natural language explanations such as ‘phishing emails often request your bank account number’. On the other hand, purely inductive learning systems typically require a large collection of labeled data for learning such a concept. If we wish ...